Reconciling Attribute Values from Multiple Data Sources
Authors
Abstract
Because of the heterogeneous nature of multiple data sources, data integration is often one of the most challenging tasks of today’s information systems. While the existing literature has focused on problems such as schema integration and entity identification, our current study attempts to answer a basic question: When an attribute value for a real-world entity is recorded differently in two databases, how should the “best” value be chosen from the set of possible values? We first show how probabilities for attribute values can be derived, and then propose a framework for deciding the cost-minimizing value based on the total cost of type I, type II, and misrepresentation errors.
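The two steps named in the abstract, deriving probabilities for the candidate values and then picking the cost-minimizing one, can be illustrated with a minimal sketch. It is not the paper's actual framework. It assumes that each source independently reports the true value with a known reliability and otherwise reports one of the remaining m - 1 domain values uniformly at random, and it stands in a generic, hypothetical `cost(stored, true)` function for the paper's type I, type II, and misrepresentation costs. The names `posterior`, `cheapest_value`, and `skewed_cost` are illustrative only.

```python
from math import prod


def posterior(reported, reliability, domain, prior=None):
    """Posterior P(A = a | reports) for every a in the domain.

    Assumption (not taken from the paper): sources are independent, source j
    reports the true value with probability reliability[j], and otherwise
    reports one of the other m - 1 values uniformly at random.
    """
    m = len(domain)
    if prior is None:
        prior = {a: 1.0 / m for a in domain}  # uniform prior as a default
    scores = {}
    for a in domain:
        likelihood = prod(
            r if v == a else (1.0 - r) / (m - 1)
            for v, r in zip(reported, reliability)
        )
        scores[a] = likelihood * prior[a]
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


def cheapest_value(post, cost):
    """Return the candidate value whose expected cost is lowest.

    cost(stored, true) is a hypothetical penalty for storing `stored`
    when the real-world value is `true`.
    """
    def expected_cost(stored):
        return sum(p * cost(stored, true) for true, p in post.items())

    return min(post, key=expected_cost)


def skewed_cost(stored, true):
    """Hypothetical asymmetric cost: wrongly storing "red" for a "green" entity is expensive."""
    if stored == true:
        return 0.0
    return 10.0 if (stored, true) == ("red", "green") else 1.0


if __name__ == "__main__":
    domain = ["red", "green", "blue"]
    post = posterior(["red", "green"], [0.9, 0.6], domain)
    print(post)
    print(cheapest_value(post, lambda s, t: 0.0 if s == t else 1.0))
    print(cheapest_value(post, skewed_cost))
```

With a symmetric 0/1 cost the most probable value is chosen. With an asymmetric cost such as `skewed_cost`, a less probable value can become the cheaper choice, which is why the decision is framed in terms of cost rather than probability alone.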
Similar resources
A Framework for Reconciling Attribute Values from Multiple Data Sources
Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a basic question: When an attribute value for a real-world entity is recorded differently in different dat...
Conversion Rules from Disparate Data Sources
The successful integration of data from autonomous and heterogeneous systems calls for the resolution of semantic conflicts that may be present. Such conflicts are often reflected by discrepancies in attribute values of the same data object. In this paper, we describe a recently developed prototype system, DIRECT (DIscovering and REconciling ConflicTs). The system mines data value conversion ru...
Reconciling Continuous Attribute Values from Multiple Data Sources
Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. The challenges exist at three different levels: schema heterogeneity, entity heterogeneity, and data heterogeneity. The existing literature has largely focused on schema heterogeneity and entity heterogeneity; and the very limited wor...
Electronic Companion — “A Framework for Reconciling Attribute Values from Multiple Data Sources”
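Proof of Proposition 1. Suppose an attribute value $a_i$ is not recorded in any of the data sources $S_1$ through $S_n$ for an entity instance. Then, from Assumptions 1 and 2 in the paper, we have $P(A = a_i \mid A_{S_1} = a_k, A_{S_2} = a_l, \ldots, A_{S_n} = a_t;\ i \neq k, i \neq l, \ldots, i \neq t) = \frac{P(A_{S_1} = a_k \mid A = a_i)\, P(A_{S_2} = a_l \mid A = a_i) \times \cdots \times P(A_{S_n} = a_t \mid A = a_i)}{P(A_{S_1} = a_k, A_{S_2} = a_l, \ldots, A_{S_n} = a_t)}\, P(A = a_i) = \frac{1 - R^{A}_{S_1}}{m - 1} \cdot \frac{1 - R^{A}_{S_2}}{m - 1} \times \cdots \times \frac{1 - R^{A}_{S_n}}{m - 1}\; P(A_{S\ldots}$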
Attribute Classification Using Feature Analysis
The basis of many systems that integrate data from multiple sources is a set of correspondences between source schemata and a target schema. Correspondences express a relationship between sets of source attributes, possibly from multiple sources, and a set of target attributes. Clio is an integration tool that assists users in defining value correspondences between attributes [1]. In real life s...